AITopics | pyramidal topology

Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology

Neural Information Processing Systems, Dec-24-2025, 06:46:12 GMT

Recent works have shown that gradient descent can find a global minimum for over-parameterized neural networks where the widths of all the hidden layers scale polynomially with N (N being the number of training samples). In this paper, we prove that, for deep networks, a single layer of width N following the input layer suffices to ensure a similar guarantee. In particular, all the remaining layers are allowed to have constant widths, and form a pyramidal topology. We show an application of our result to the widely used Xavier's initialization and obtain an over-parameterization requirement for the single wide layer of order N^2.
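The architecture described in the abstract above can be illustrated with a small width check. This is only a sketch: the function name `is_wide_then_pyramidal` is hypothetical, and it assumes that "pyramidal topology" means the hidden-layer widths after the first one are non-increasing toward the output, and that the first hidden layer has width at least N. The paper's precise assumptions may differ.

```python
from typing import Sequence


def is_wide_then_pyramidal(widths: Sequence[int], num_samples: int) -> bool:
    """Check a list of hidden-layer widths, ordered from input side to output side.

    Assumed reading of the abstract (not taken from the paper's formal statement):
      * the first hidden layer has width >= num_samples (N), and
      * every later width is <= the one before it (non-increasing).
    """
    if not widths:
        return False
    first_is_wide = widths[0] >= num_samples
    rest = list(widths[1:])
    # Compare each remaining width with its successor.
    non_increasing = all(a >= b for a, b in zip(rest, rest[1:]))
    return first_is_wide and non_increasing


if __name__ == "__main__":
    n_samples = 1000
    print(is_wide_then_pyramidal([1000, 64, 32, 16], n_samples))  # wide, then shrinking
    print(is_wide_then_pyramidal([500, 64, 32, 16], n_samples))   # first layer too narrow
    print(is_wide_then_pyramidal([1000, 32, 64, 16], n_samples))  # width grows again
```

Only the first width is compared against N; the remaining layers may be arbitrarily narrow, which matches the abstract's statement that they are allowed to have constant widths.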

deep network, global convergence, wide layer followed, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology

Quynh Nguyen

Neural Information Processing Systems, Aug-15-2025, 01:21:54 GMT

Understanding this phenomenon has recently attracted a lot of interest within the research community.

gradient descent, international conference, neural network, (14 more...)


Country:

North America > Canada (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology

Neural Information Processing Systems, May-27-2025, 05:17:07 GMT

Recent works have shown that gradient descent can find a global minimum for over-parameterized neural networks where the widths of all the hidden layers scale polynomially with N (N being the number of training samples). In this paper, we prove that, for deep networks, a single layer of width N following the input layer suffices to ensure a similar guarantee. In particular, all the remaining layers are allowed to have constant widths, and form a pyramidal topology. We show an application of our result to the widely used Xavier's initialization and obtain an over-parameterization requirement for the single wide layer of order N^2.

artificial intelligence, machine learning, wide layer followed, (3 more...)


Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Review for NeurIPS paper: Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology

Neural Information Processing Systems, Jan-26-2025, 10:54:48 GMT

This paper shows global convergence of gradient descent for deep neural networks that have a wide first layer followed by layers of pyramidal shape. It shows that, under an unconventional initialization, a first-layer width of N (the number of training samples) suffices for global convergence, which is much smaller than the width required under the usual Xavier initialization. Global convergence at width N is a substantial improvement over existing results and a valuable contribution. The authors are encouraged to add a more detailed discussion of the connection to existing NTK theory and of the possibility of relaxing the assumptions made in the analysis.

global convergence, pyramidal topology, wide layer followed, (4 more...)


Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.73)


Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology

Neural Information Processing Systems, Oct-10-2024, 17:50:10 GMT

Recent works have shown that gradient descent can find a global minimum for over-parameterized neural networks where the widths of all the hidden layers scale polynomially with N (N being the number of training samples). In this paper, we prove that, for deep networks, a single layer of width N following the input layer suffices to ensure a similar guarantee. In particular, all the remaining layers are allowed to have constant widths, and form a pyramidal topology. We show an application of our result to the widely used Xavier's initialization and obtain an over-parameterization requirement for the single wide layer of order N^2.

global convergence, pyramidal topology, wide layer followed, (1 more...)


Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Global Convergence of Deep Networks with One Wide Layer Followed by Pyramidal Topology

Nguyen, Quynh; Mondelli, Marco

arXiv.org Machine Learning, Feb-18-2020

A recent line of research has provided convergence guarantees for gradient descent algorithms in the excessive over-parameterization regime where the widths of all the hidden layers are required to be polynomially large in the number of training samples. However, the widths of practical deep networks are often only large in the first layer(s) and then start to decrease towards the output layer. This raises an interesting open question whether similar results also hold under this empirically relevant setting. Existing theoretical insights suggest that the loss surface of this class of networks is well-behaved, but these results usually do not provide direct algorithmic guarantees for optimization. In this paper, we close the gap by showing that one wide layer followed by pyramidal deep network topology suffices for gradient descent to find a global minimum with a geometric rate. Our proof is based on a weak form of Polyak-Lojasiewicz inequality which holds for deep pyramidal networks in the manifold of full-rank weight matrices.
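For orientation, the textbook Polyak-Lojasiewicz (PL) inequality and the geometric rate it yields are written out here. This is the standard form for a generic differentiable objective, not the weak form used in the paper; the symbols f, f^*, \theta, \mu, and L are assumptions introduced only for this illustration.

```latex
% Standard PL inequality: f differentiable with minimum value f^*, constant \mu > 0.
\[
  \tfrac{1}{2}\,\lVert \nabla f(\theta) \rVert^{2} \;\ge\; \mu\,\bigl(f(\theta) - f^{*}\bigr)
  \quad \text{for all } \theta .
\]
% If f is also L-smooth, gradient descent with step size 1/L,
% \theta_{k+1} = \theta_k - \tfrac{1}{L}\nabla f(\theta_k), satisfies
\[
  f(\theta_{k+1}) - f^{*} \;\le\; \Bigl(1 - \frac{\mu}{L}\Bigr)\bigl(f(\theta_k) - f^{*}\bigr),
\]
% so the loss gap shrinks geometrically in the iteration count k.
```

The second line follows from the smoothness bound f(\theta_{k+1}) \le f(\theta_k) - \frac{1}{2L}\lVert \nabla f(\theta_k) \rVert^2 combined with the PL inequality, which is why a PL-type condition gives a direct algorithmic guarantee without requiring convexity.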

deep network, global convergence, wide layer followed, (12 more...)

arXiv.org Machine Learning

2002.07867

Country:

Europe > Germany > Saarland (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
Europe > Austria (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

